Executive Summary

This report enables NESO to better understand and forecast long-term electricity demand. Using ordinary least squares, we identify key temporal and weather-based predictors of average daily demand. Valuing simplicity, we chose our model for both practical use and predictive ability.

Data and Exploratory Analysis

Prior to fitting any models we examine the dataset used in our analysis. We consider the following predictor variables for our linear model:

We visually inspected scatter plots of each predictor against demand to identify correlations and potential anomalies.

Wind generation showed no clear correlation with demand and was therefore excluded from further modelling, as it provides no predictive value. In contrast, solar generation exhibited an inverse relationship with demand. This aligns with expectations, as increased sunlight reduces the need for lighting, lowering electricity demand.

Temperature variables all showed a negative correlation with demand. This is intuitive, as warmer conditions generally reduce heating needs. TE shows the strongest negative effect and is considered the best temperature measure because it incorporates a lagged effect that may reflect customers updating heating settings with a delay.

The DSN variable follows a quadratic relationship with demand, but has a pronounced dip in December. This suggests that an interaction between month and DSN should be considered in the model.

Month has a clear effect on demand, with the highest levels observed in January and the lowest in November and March. Day of the week is an important predictor to include in the model, as demand is lower on weekends and highest on weekdays. However, combining weekdays and weekends reduces the resolution of important … differences. Year also has an impact on demand and will be included as a factor variable. However, this will be stripped out of the model by NESO, who will add their own year effect for future years.
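The visual checks above can be complemented numerically. A minimal sketch, using synthetic data and hypothetical variable names (`TE`, `solar`, `wind`, `demand`), computing each candidate predictor's correlation with demand:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 365
# Synthetic stand-ins for the real series (names and magnitudes are illustrative)
TE = rng.normal(10, 5, n)                       # effective temperature
solar = np.clip(rng.normal(0.3, 0.1, n), 0, 1)  # solar generation proxy
wind = rng.normal(0.5, 0.2, n)                  # wind generation proxy
demand = 40000 - 500 * TE - 8000 * solar + rng.normal(0, 1500, n)

# Correlation of each predictor with demand
for name, x in [("TE", TE), ("solar", solar), ("wind", wind)]:
    r = np.corrcoef(x, demand)[0, 1]
    print(f"{name}: r = {r:+.2f}")
```

On these synthetic series, TE and solar show clear negative correlations while wind sits near zero, mirroring the qualitative pattern described above.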

Model fitting and cross-validation results

Metrics & Linear Models

We aim to fit an ordinary least-squares linear regression model of the form \[\mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\epsilon}\] assuming

  • Linearity of model parameters.
  • The errors, \(\boldsymbol{\epsilon}\), are normally distributed.
  • The errors are independent.
  • The errors have constant variance (homoscedasticity).

Metrics

Variable significance is assessed using hypothesis tests on the model parameters, where a high p-value indicates no statistically significant effect in the model. The \(R^2\) metric measures the proportion of variance explained by the linear model; models with a higher adjusted \(R^2\) are preferable, as this indicates the model explains a higher proportion of the variability in the data. Predictive accuracy is assessed through the root mean squared error (RMSE), which measures the average magnitude of the residuals, with lower values indicating that the model’s predictions are closer to the observed data. The AIC compares the quality of models by balancing fit with complexity, penalising the number of parameters in the model; the model with the lowest AIC is generally preferred, as it explains the data well without including unnecessary parameters.
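As a concrete illustration, these metrics can be computed directly from an OLS fit. A minimal sketch on synthetic data; the AIC here uses the Gaussian log-likelihood form, counting \(\sigma^2\) as an extra parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + 3 predictors
beta_true = np.array([5.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(0, 1.0, n)

# Ordinary least squares via least-squares solve
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
rss = resid @ resid
tss = ((y - y.mean()) ** 2).sum()

k = X.shape[1]                                   # number of regression coefficients
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
rmse = np.sqrt(rss / n)
sigma2_mle = rss / n
aic = n * np.log(2 * np.pi * sigma2_mle) + n + 2 * (k + 1)  # +1 counts sigma^2

print(r2, adj_r2, rmse, aic)
```

Since the noise standard deviation is 1, the RMSE lands near 1 and \(R^2\) is high; the adjusted \(R^2\) is always slightly below \(R^2\) once more than one parameter is fitted.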

Model selection

Based on the exploratory data analysis, the final model was selected to capture the most important predictors while accounting for non-linearities and interactions and balancing simplicity:

\[\begin{eqnarray*} M_F: y_i = \beta_0 + \beta_1 \text{solar}_i + \beta_2 \text{wdaynumber}_i + \beta_3 \text{month}_i + \beta_4 \text{year}_i + \beta_5 \text{DSN}^2_i + \beta_6 \text{TE}_i + \beta_7 \, \text{DSN}^2_i{:}\text{month}_i + \epsilon_i \end{eqnarray*}\] where \(\epsilon_i \overset{\mathrm{iid}}\sim \mathcal{N}(0, \sigma^2)\).
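The design matrix for such a model, with factor dummies and the \(\text{DSN}^2\):month interaction, can be built explicitly. A minimal numpy sketch on synthetic data (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
dsn = rng.integers(0, 151, n).astype(float)  # days since start of season
month = rng.choice([11, 12, 1, 2, 3], n)     # winter months only
te = rng.normal(8, 4, n)

# Dummy-code the month factor, dropping month 11 as the baseline level
levels = [12, 1, 2, 3]
month_dummies = np.column_stack([(month == m).astype(float) for m in levels])

X = np.column_stack([
    np.ones(n),                           # intercept
    te,                                   # TE main effect
    dsn ** 2,                             # DSN^2 main effect
    month_dummies,                        # month factor (baseline dropped)
    month_dummies * (dsn ** 2)[:, None],  # DSN^2 : month interaction
])
print(X.shape)  # (100, 11): 1 + 1 + 1 + 4 + 4 columns
```

Each interaction column is simply a month dummy multiplied elementwise by \(\text{DSN}^2\), which is what allows the quadratic effect to vary by month.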

The TE variable was selected as the temperature measure because it incorporates information from both the current and previous day, capturing lagged effects in demand. When compared to alternatives, the model using TE produced a higher \(R^2\), lower RMSE and lower AIC, indicating a better overall model fit and predictive performance.

Model Performance Comparison
Model \(R^2\) Adjusted \(R^2\) RMSE AIC
Model with temp 0.8585733 0.8569565 1533.105 62062.24
Model with TO 0.8478952 0.8461564 1589.928 62319.91
Model with TE 0.8696681 0.8681782 1471.741 61773.03

Solar generation was included as higher solar output was associated with lower average demand, and it remained statistically significant in the final model (\(p<0.01\)). Day-of-the-week effects were modelled as a factor to capture the reduction in weekend demand and small variations across weekdays, with all factor levels highly significant (\(p<0.01\)). The DSN (days since November) variable exhibited a quadratic relationship with demand, so \(\text{DSN}^2\) was included to capture this non-linearity. Month and Year were included as factor variables to account for seasonal and annual variation. A distinct dip in demand in December motivated the inclusion of an interaction between \(\text{DSN}^2\) and Month, allowing the model to capture monthly variations in the quadratic effect and substantially improving the AIC (a decrease of 1,846.95).

Table 1: Regression Results
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33526.962 321.157 104.394 0.000
TE -529.501 10.654 -49.700 0.000
solar_sarah -15846.613 1141.001 -13.888 0.000
factor(WeekdayNum)1 5596.396 93.053 60.142 0.000
factor(WeekdayNum)2 6324.780 93.121 67.920 0.000
factor(WeekdayNum)3 6360.136 93.097 68.317 0.000
factor(WeekdayNum)4 6279.360 93.250 67.339 0.000
factor(WeekdayNum)5 5464.630 93.218 58.622 0.000
factor(WeekdayNum)6 1169.326 93.079 12.563 0.000
I(DSN^2) 0.660 0.041 16.114 0.000
factor(Month)2 4825.442 455.153 10.602 0.000
factor(Month)3 5306.641 496.149 10.696 0.000
factor(Month)11 3286.746 266.867 12.316 0.000
factor(Month)12 9149.686 289.650 31.589 0.000
factor(Year)1992 -34.595 226.488 -0.153 0.879
factor(Year)1993 -26.648 226.925 -0.117 0.907
factor(Year)1994 453.790 227.098 1.998 0.046
factor(Year)1995 1502.790 226.743 6.628 0.000
factor(Year)1996 2365.578 227.285 10.408 0.000
factor(Year)1997 2804.758 227.065 12.352 0.000
factor(Year)1998 3134.583 227.123 13.801 0.000
factor(Year)1999 3779.208 226.878 16.657 0.000
factor(Year)2000 4435.593 226.786 19.558 0.000
factor(Year)2001 5206.297 226.702 22.965 0.000
factor(Year)2002 5574.093 227.401 24.512 0.000
factor(Year)2003 6347.567 226.968 27.967 0.000
factor(Year)2004 6576.452 226.678 29.012 0.000
factor(Year)2005 5704.218 226.901 25.140 0.000
factor(Year)2006 5376.686 226.954 23.691 0.000
factor(Year)2007 4761.861 227.234 20.956 0.000
factor(Year)2008 4100.558 226.620 18.094 0.000
factor(Year)2009 2600.179 226.727 11.468 0.000
factor(Year)2010 2864.807 228.077 12.561 0.000
factor(Year)2011 2258.601 227.157 9.943 0.000
factor(Year)2012 1629.380 226.662 7.189 0.000
factor(Year)2013 1119.108 227.056 4.929 0.000
factor(Year)2014 488.905 227.196 2.152 0.031
I(DSN^2):factor(Month)2 -0.708 0.053 -13.357 0.000
I(DSN^2):factor(Month)3 -0.728 0.047 -15.418 0.000
I(DSN^2):factor(Month)11 0.502 0.225 2.231 0.026
I(DSN^2):factor(Month)12 -3.808 0.079 -48.028 0.000

Evaluation of Model


To assess model performance, we first compare the \(R^2\), adjusted \(R^2\), AIC and RMSE of the final model to alternatives. In particular, we compare against

Model with no interactions: \[y_i = \beta_0 + \beta_1 \text{solar}_i + \beta_2 \text{wdaynumber}_i + \beta_3 \text{month}_i + \beta_4 \text{year}_i + \beta_5 \text{DSN}^2_i + \beta_6 \text{TE}_i + \epsilon_i\] and Model with all interactions: \[y_i = \beta_0 + \beta_1 \text{solar}_i + \beta_2 \text{wdaynumber}_i + \beta_3 \text{month}_i + \beta_4 \text{year}_i + \beta_5 \text{DSN}^2_i + \beta_6 \text{TE}_i + \beta_7 \, \text{TE}_i{:}\text{solar}_i + \beta_8 \, \text{TE}_i{:}\text{month}_i + \beta_9 \, \text{solar}_i{:}\text{month}_i + \beta_{10} \, \text{wdaynumber}_i{:}\text{month}_i + \beta_{11} \, \text{DSN}^2_i{:}\text{month}_i + \epsilon_i\]

While the model with all interactions achieved a slightly higher \(R^2\) and adjusted \(R^2\) and a lower RMSE, these marginal gains come at the cost of many additional parameters. Our final model, with a single interaction term, still performs substantially better than the no-interaction model, so we prefer it on grounds of parsimony and interpretability.

To continue checking the model’s performance, we examine the residuals to see whether the underlying assumptions are satisfied, ensuring the model provides a valid fit to the data.

The Residuals vs Fitted plot shows little change in variance with the mean, supporting the constant variance assumption. The Q-Q plot, which plots the ordered standardised residuals against quantiles of a standard normal, shows a slight S-shaped deviation from normality, suggesting skewness in the residuals. However, given the large sample size, the central limit theorem ensures the sampling distribution of the estimators remains approximately normal despite this deviation. The Scale-Location plot shows very slight heteroscedasticity (non-constant error variance), as the fitted line has a minor curve; this is not a major concern given the large sample size and the model’s predictive, rather than inferential, purpose. Finally, the Residuals vs Leverage plot gives Cook’s distance, which measures the change in all fitted values when the data point in question is omitted. No points have a large Cook’s distance or high leverage, so we conclude there are no outliers requiring investigation.

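The quantities behind these diagnostic plots can be computed directly. A minimal numpy sketch, on synthetic data, of leverage, standardised residuals and Cook’s distance via the hat matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 120, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(0, 1, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix
lev = np.diag(H)                                 # leverage h_ii
s2 = resid @ resid / (n - k)                     # residual variance estimate
std_resid = resid / np.sqrt(s2 * (1 - lev))      # standardised residuals
cooks = std_resid ** 2 * lev / (k * (1 - lev))   # Cook's distance

print(lev.sum())  # trace of H equals k, the number of parameters
print(cooks.max())
```

A common rule of thumb flags points with Cook’s distance near 1 or leverage well above the average \(k/n\); on well-behaved data like this, no point comes close.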

Cross-Validation

Methodology

To assess model generalisability, we used expanding-window cross-validation. Since our model includes year as a factor, which NESO would update annually based on their estimated year effect, we altered our cross-validation method as follows:

First, we fit the model on an initial training set from 1991 up to 2000. The year effect is then removed from the trained model, and we use this to compute ‘yearless predictions’ for 2001. To mimic adding back a year effect, we assume a known year effect, using the coefficient estimated from the full dataset; this is added to the yearless predictions to obtain our final predictions. We then compare these with the observed data for 2001 and obtain the following metrics:

  • Squared error: \(\text{SE} = (y - \hat{y}_F)^2\)

  • Interval score: \(\text{IS}(\alpha) = U_F - L_F + \frac{2}{\alpha} (L_F - y) \mathcal{1}\{y < L_F\} + \frac{2}{\alpha} (y - U_F) \mathcal{1}\{y > U_F\}\), where \(L_F\) and \(U_F\) denote the lower and upper bounds of the central \((1-\alpha)\) prediction interval.

  • Dawid-Sebastiani: \(\text{DS} = \frac{(y - \hat{y}_F)^2}{\sigma^2_F} + \log(\sigma^2_F)\)
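The three scores above can be written as small functions; a minimal sketch, with the interval score in its standard form for a central \((1-\alpha)\) interval:

```python
import numpy as np

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

def interval_score(y, lower, upper, alpha):
    # Width of the interval plus 2/alpha times any overshoot outside it
    return ((upper - lower)
            + (2 / alpha) * (lower - y) * (y < lower)
            + (2 / alpha) * (y - upper) * (y > upper))

def dawid_sebastiani(y, y_hat, sigma2):
    return (y - y_hat) ** 2 / sigma2 + np.log(sigma2)

y, y_hat = 100.0, 90.0
print(squared_error(y, y_hat))             # 100.0
print(interval_score(y, 80.0, 95.0, 0.1))  # 15 + 20 * (100 - 95) = 115.0
print(dawid_sebastiani(y, y_hat, 25.0))
```

Note how the interval score rewards narrow intervals but heavily penalises observations that fall outside them, so it measures both sharpness and calibration.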

We then repeat this process, expanding the training set by one year and moving the test set forward one year.
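The expanding-window scheme can be sketched as a loop over test years; the model fitting and year-effect handling described above are elided, with only the split logic shown:

```python
# Expanding-window splits: train on 1991..(test_year - 1), test on test_year.
years = list(range(1991, 2015))

splits = []
for test_year in range(2001, 2015):
    train_years = [y for y in years if y < test_year]
    splits.append((train_years, test_year))

print(len(splits))                     # 14 test sets
print(splits[0][0][-1], splits[0][1])  # first fold trains up to 2000, tests on 2001
```

Each step grows the training window by one year, so later folds see more history while the chronological order of the data is never violated.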

Results

The observed vs predicted values from our cross-validation are plotted in Figure 2. The points lie scattered around the \(y=x\) line, indicating accurate predictions. A few points lie above the line; however, these occur at lower demand levels. Since NESO prioritises accuracy at higher demand levels, where shortfalls are more likely to occur, the model remains well-suited for estimating average daily demand, particularly during periods of high demand.

Using cross-validation we find the following predictive scores for our final model and a baseline model:

Predictive Scores for Final and Baseline Model
Model RMSE Mean Dawid Sebastiani Score Mean Interval Score
\(M_F\) 1462.544 15.58537 9085.693
\(M_{B}\) 1872.626 16.09661 12492.786

Each predictive score is lower for \(M_F\), suggesting both more accurate predictions and better-calibrated uncertainty around those predictions.

To evaluate potential overfitting, we compare the RMSE based on the full dataset with the RMSE from the Cross Validation:

How the model performs on trained data vs new data
Fitted RMSE Cross Validation RMSE
1471.741 1462.544

The fitted RMSE is slightly greater than the cross-validation RMSE. This is unusual, but may be explained by the averaging over the 14 test sets; in any case, the difference is minor relative to the scale of the data, indicating that the model performs consistently on both training and new data. We therefore conclude the final model is not overfitting.

Finally, we consider limitations of our method: by assuming the known year effect from the full model, we leak information about the test year into the predictions. We still chose this method for two reasons. First, it minimises disruption to the chronological structure of the data. Second, expanding the training set at each step incorporates more historical data, helping the model capture long-term, recurring monthly and weekly trends more effectively.